1. Introduction


Natural hazards such as earthquakes have caused great human and economic losses; consequently, demand has grown in recent years for predicting such events so that societies can prepare for them and reduce their effects on infrastructure. The concept of natural risk can be interpreted as a relative awareness on the part of society, although there is no universally accepted definition of it [1]. Technically, risk can be defined as the combination of the probability of a hazardous event and its potential consequences.


A natural hazard becomes a disaster when the potential event actually occurs and causes human and economic losses. Because of its severe effects, the earthquake is recognized as the most devastating natural phenomenon among hazards such as tsunamis, floods, tornadoes, and volcanic eruptions [2]. An earthquake occurs suddenly, without any definite warning sign, and leads to loss of life and injuries, destruction of buildings and infrastructure, social and economic losses, and even environmental pollution [3]. Moreover, many densely populated regions are located within seismic areas. In addition to ground shaking, an earthquake can produce secondary effects such as liquefaction [4], landslides [5], and tsunamis [6].


Seismic risk can be defined as the combination of seismic vulnerability and seismic hazard [7]. Seismic hazard describes a potential seismic event that can cause damage and losses, whereas seismic vulnerability describes the extent of the destructive effect of that hazard.


Large datasets help in building and understanding regression prediction models, and more data can improve the accuracy of their predictions [8]. Increasing our knowledge of earthquakes can assist in improving the management of seismic risk and seismic hazard [9].


The present investigation forecasts earthquake information in a region of Iran that has experienced numerous earthquakes since 2009 and has a large seismic database. Large population growth has been observed in this area, and earthquakes are considered a major threat to urbanization and to the people who live there. Time series models, namely ARIMA, GARCH, and a combination of GARCH and ARIMA, are used in this investigation to predict the magnitude of possible earthquakes in this area. Although the datasets used in previous studies were limited to several megabytes, Asencio-Cortes et al. employed a 1 GB dataset to predict earthquakes in California [10,11]. They used several regression models to predict earthquakes over the following seven days. In addition, several useful studies have focused on regression and neural network (NN) models for predicting outputs [12-22]. The novelty of this study lies in predicting earthquake magnitude with time series models and with an ensemble model built from two time series models.


2. Methodology 


Three models are applied to predict the magnitude of earthquakes of magnitude 2.5 or higher (M ≥ 2.5) along the Zagros fault line. The earthquake dataset was acquired from the seismological networks of the Institute of Geophysics at the University of Tehran (IGTU). Table 1 lists the most important features of the event catalog. According to Table 1, the data comprise 9017 earthquake events with a magnitude of 2.5 or higher that occurred in three provinces of Iran from January 2009 to November 2018.


Table 1
Primary specifications of the original dataset used in the current study.
Zone          Ilam, Khuzestan, and Bushehr provinces (Iran)
Source        Iranian Seismological Center (IGTU)
Period        2009-2018
Total events  9017


According to the earthquake catalog, the statistical features of the variables used for predicting earthquake magnitude are presented in Table 2. In this dataset, the minimum longitude and latitude are 45.8 and 28.47, respectively, and the maximum longitude and latitude are 56.3 and 34.5, respectively. These coordinates cover the west, south-west, south, and center of Iran. Figure 1 shows the regions whose data are used in this investigation.


Fig. 1. The studied region of events. 


Table 2
Statistical characteristics of the analytical dataset used for the present regression study.
Variable   Min    Max   Mean   Median  Variance
Magnitude  2.5    6.2   2.966  2.8     0.222
Latitude   28.47  34.5  32.01  32.31   1.72
Longitude  45.8   56.3  49.95  49.43   7.856


The first model, used for predicting the next earthquake magnitude, is the generalized autoregressive conditional heteroscedasticity (GARCH) model with a mean offset. The second model is the autoregressive integrated moving average (ARIMA) model. The third model combines ARIMA and GARCH through the multiple linear regression (MLR) technique; in other words, the outputs of the ARIMA and GARCH models are the inputs of the MLR model. The earthquake magnitudes recorded between 2009 and 2016 are used as inputs for training the models. After training, the calibrated models are employed to predict earthquake magnitudes during 2017-2018. Finally, the predicted magnitudes are assessed by comparing them with the magnitudes recorded from 2017 to November 2018. Statistical parameters, namely the correlation coefficient, root mean square error (RMSE), normalized mean square error (NMSE), and fractional bias, are calculated to evaluate the accuracy of the models.
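
The training and testing protocol described above can be sketched as follows. This is only a minimal illustration, assuming the catalog is available as (date, magnitude) pairs; the function name `chronological_split` and the sample rows are hypothetical and are not taken from the IGTU catalog.

```python
from datetime import date


def chronological_split(events, first_test_year=2017):
    """Split (date, magnitude) pairs into training and test sets by year."""
    ordered = sorted(events, key=lambda ev: ev[0])  # oldest event first
    train = [ev for ev in ordered if ev[0].year < first_test_year]
    test = [ev for ev in ordered if ev[0].year >= first_test_year]
    return train, test


# Hypothetical catalog rows, used only to show the call.
catalog = [
    (date(2017, 3, 2), 3.1),
    (date(2009, 1, 15), 2.7),
    (date(2016, 12, 30), 4.0),
    (date(2018, 11, 1), 2.9),
]
train, test = chronological_split(catalog)
```

Splitting by date rather than at random keeps every test event later than every training event, which matches the 2009-2016 and 2017-2018 periods used in this study.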


3. Results and discussion 


As mentioned above, the data for 2009-2016 were used as input for training the models. The calibrated models were then used to calculate the predicted values for 2017-2018.


The correlation coefficients between the current earthquake magnitude and the first, second, third, and fourth previous earthquake magnitudes were evaluated, as shown in Table 3.


Table 3
Correlation between the earthquake data and the four previous earthquake data for magnitude, longitude, latitude, and time range of earthquake occurrence.

Correlation between the earthquake data and the:
S.N  Variable   previous data  second previous data  third previous data  fourth previous data
1    Magnitude  0.101          0.060                 0.058                0.033


According to the results in Table 3, the ARIMA (1,1,1) model and the GARCH (1, 1) model with mean offset 1 are chosen in this study.
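
The lagged correlations reported in Table 3 can be computed along the following lines. This is a sketch under stated assumptions: the magnitudes are held in a plain list in chronological order, the helper names `pearson` and `lag_correlation` are illustrative, and the sample values are hypothetical rather than catalog data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)  # undefined if either series is constant


def lag_correlation(series, lag):
    """Correlation between each value and the value `lag` events earlier."""
    if lag < 1 or lag >= len(series) - 1:
        raise ValueError("lag must leave at least two overlapping pairs")
    return pearson(series[lag:], series[:-lag])


# Hypothetical magnitudes in order of occurrence.
magnitudes = [2.5, 2.8, 3.1, 2.6, 2.9, 3.4, 2.7, 3.0]
table = {lag: lag_correlation(magnitudes, lag) for lag in range(1, 5)}
```

Each entry of `table` corresponds to one column of Table 3: lag 1 is the previous event, lag 2 the second previous event, and so on.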


3.1. Prediction models 


In the GARCH model, an autoregressive moving average structure is assumed for the error variance; when a mean offset is included, it appears in the output as an additional parameter to be estimated or otherwise specified. As mentioned above, the GARCH (1, 1) model with mean offset 1 is used to predict earthquake magnitude along the Zagros faults. The Box-Jenkins methodology refers to a set of procedures for identifying and estimating time series models within the class of ARIMA models [23]. Among the different ARIMA models, ARIMA (1,1,1) has been chosen. The third model is the combination of the two time series models, in which the outputs of models 1 and 2 are the input variables.


3.2. Prediction of magnitude 


Magnitude is the scale that describes the overall strength or size of an earthquake. Fig. 2 compares the predicted and recorded magnitudes for each method. Fig. 2 (a) shows that the magnitudes predicted by GARCH are not acceptable. As shown in Fig. 2 (b), the magnitudes predicted by ARIMA are close to the recorded values; hence, the ARIMA model can predict acceptable values. According to Fig. 2 (c), the values predicted by model 3 are closer to the recorded values than those of model 2; therefore, model 3 appears to predict magnitude better than model 2.
Fig. 2. Comparison of actual and predicted magnitude values for each regression model: (a) GARCH, (b) ARIMA, (c) model 3.


Table 4 lists the statistical parameters calculated from the predicted and recorded magnitudes. These parameters evaluate the accuracy of the models in predicting magnitude. According to the fractional bias for 2017-2018, all models under-predict, although only slightly, since the values are of the order of 1E-07 or smaller. The correlation coefficient of GARCH is smaller than 0.005, which shows that model 1 cannot predict earthquake magnitude; in contrast, the correlation coefficients of ARIMA and model 3 are 0.9852 and 0.9946, respectively. As a result, models 2 and 3 give acceptable results. The RMSE and NMSE are 0.0565 and 0.0186 for model 2, and 0.0339 and 0.0116 for model 3; therefore, model 3 performs better than model 2 with respect to the RMSE and NMSE, as well as with respect to the correlation coefficient. These parameters show that models 2 and 3 are both acceptable for predicting magnitude, although model 3 predicts earthquake magnitude more accurately than model 2. The comparison between the predicted and actual magnitudes in Fig. 2 supports the same conclusion.


Table 4
Comparison of predicted and monitored magnitude values in 2009-2016 and in 2017-2018.
                         2009-2016                                   2017-2018
S.N  Method              Corr. coef.  RMSE    NMSE    Frac. bias     Corr. coef.  RMSE    NMSE    Frac. bias
1    GARCH               0.0182       0.4730  0.8544  -2.43E-08      0.0019       0.4612  1.3647  1.21E-07
2    ARIMA               0.9553       0.1009  0.0248  -3.87E-08      0.9852       0.0565  0.0186  1.86E-07
3    The ensemble model  0.9925       0.0413  0.0106  -2.98E-09      0.9946       0.0339  0.0116  1.44E-08


To develop model 3, the outputs of models 1 and 2 are required; therefore, the equations of all three models are needed to predict earthquake magnitude. Equations (1), (2), and (3) give the predicted magnitude for models 1, 2, and 3, respectively:


M_, = 2.09236 +0.39546y,_, + 0.09272¢. , (1) 
M , =-8.6507x10° +a, + 0.07257a,_, - 0.987046, , (2) 
$M_t = 0.28281\,G + 0.9976\,A + 0.20481$ (3)


where $\varepsilon_t$ is the white-noise term, i.e., it is independently and identically distributed with a common mean of 0 and a common variance $\sigma^2$ across all observations, and $\mu$ is the mean offset. $y_{t-1}$ and $a_{t-1}$ are the last magnitudes. G and A are the outputs of models 1 and 2, respectively.


The ARIMA results show an acceptable prediction of the magnitude and duration of the earthquake. ARIMA minimizes the error of the prediction model, whereas GARCH minimizes its variance. Given these complementary abilities, combining ARIMA and GARCH through MLR can yield a better predictive equation than ARIMA or GARCH alone.


4. Conclusions 


The main aim of this study is to develop time series models for predicting earthquake magnitude. Earthquake events along the Zagros fault from 2009 to 2018 with a magnitude of 2.5 or higher have been used. The earthquake dataset was acquired from the seismological networks of the Institute of Geophysics at the University of Tehran (IGTU). Two time series models and an ensemble model have been used to predict future earthquake magnitude. The results of this study are summarized as follows:


• The results demonstrate that ARIMA is an acceptable model for predicting earthquake magnitude.

• Model 3, which is the combination of ARIMA and GARCH, predicts earthquake magnitude better than ARIMA alone. This shows that the model 3 algorithm is the stronger predictor of earthquake magnitude along the Zagros fault.

• No physics-based formulation is applied in the models to predict earthquake magnitude. The proposed models are statistical models based on time series formulations.

• The time series models can be combined with deterministic models such as Gaussian plume or Eulerian models, which may improve the predictions.


Appendix A 


Model 1: GARCH 


The first model is the GARCH method with a mean offset. GARCH was developed as a generalization of the autoregressive conditional heteroscedasticity (ARCH) method. ARCH is a statistical method for time series data that describes the variance of the current error term, or innovation, as a function of the actual sizes of the error terms in previous time periods; the variance is related to the squares of the previous innovations. The GARCH (p, q) model is given by the following equations, where p is the order of the GARCH terms $\sigma^2$, q is the order of the ARCH terms $\varepsilon^2$, and $\mu$ is the mean offset. The mean offset appears in the output as an additional parameter to be estimated or otherwise specified.


$y_t = x'_t b + \mu + \varepsilon_t$ (A1)

$\varepsilon_t \mid \psi_{t-1} \sim N(0, \sigma_t^2)$ (A2)

$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \dots + \alpha_q \varepsilon_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \dots + \beta_p \sigma_{t-p}^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2$ (A3)


where $\psi_{t-1}$ is the information available up to time t-1 and, for the GARCH (1, 1) case, $\omega > 0$, $\alpha_1 > 0$, and $(\beta_1 + \alpha_1) < 1$.


The lag length p of a GARCH (p, q) process is established in three steps: the first is to estimate the best-fitting AR (q) model, the second is to compute and plot the autocorrelations of $\varepsilon^2$, and the third relies on the asymptotic behavior of these autocorrelations. The asymptotic (large-sample) standard deviation of $\rho(i)$ is $1/\sqrt{T}$, where T is the sample size, and individual values larger than this indicate GARCH errors. The Ljung-Box test is used to estimate the total number of lags, and it is recommended to consider up to T/4 values of n. There are different methods for estimating $\sigma_t^2$ in the literature. Many authors have chosen models within the GARCH family [24]. The simplest of these is the GARCH (1, 1) model [25], which follows from equation (A3). The GARCH model is used here to estimate the conditional variance. In the present study, the GARCH (1, 1) model has been applied for simplicity, but other GARCH prediction models could be used in future studies.
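
The conditional variance recursion of the GARCH (1, 1) case of equation (A3) can be sketched as follows. This is not the fitting procedure used in the study; it only shows how the variance is propagated once the parameters are known. The function name `garch11_variance`, the choice of the unconditional variance as the starting value, and the parameter values in the example are assumptions made for illustration.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances from sigma_t^2 = omega + alpha*eps_{t-1}^2 + beta*sigma_{t-1}^2."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    # Assumed start-up value: the unconditional variance omega / (1 - alpha - beta).
    variances = [omega / (1.0 - alpha - beta)]
    for eps in residuals:
        variances.append(omega + alpha * eps ** 2 + beta * variances[-1])
    # The last entry is the one-step-ahead variance after the final residual.
    return variances


# Hypothetical parameters and residuals, chosen only to show the call.
path = garch11_variance([0.3, -0.1, 0.5], omega=0.1, alpha=0.2, beta=0.3)
```

A large residual raises the next conditional variance through the `alpha` term, while the `beta` term carries the previous variance forward, which is the volatility clustering that GARCH is designed to capture.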


Model 2: ARIMA 


As mentioned above, the second model in the present study is ARIMA. One approach to analyzing time series data is the Box-Jenkins ARIMA method, which exploits the predictable behavior of the monitored values [23]. The ARIMA process of order (p, d, q) is given by equation (A4):


$W_t = \sum_{i=1}^{p} \phi_i W_{t-i} + a_t - \sum_{j=1}^{q} \theta_j a_{t-j}$ (A4)


in which $W_t = \nabla^d y_t$, d is the order of differencing, $\nabla$ is the backward difference operator, p is the order of the autoregressive process, and q is the order of the moving average process. $\phi_i$ and $\theta_j$ are the i-th autoregressive parameter and the j-th moving average parameter, respectively. $y_t$ and $a_t$ are the recorded value and the error term at time t. In this study, the correlation coefficients between the current earthquake data and the first, second, third, and fourth previous earthquake data are calculated to select p of the ARIMA model.
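
For the ARIMA (1,1,1) case, equation (A4) reduces to a short recursion on the first differences. The sketch below shows a one-step-ahead forecast under that recursion. It assumes the parameters are already estimated, sets the unknown start-up difference and innovation to zero, and omits any constant term; the function name `arima111_forecast` and the sample values are hypothetical.

```python
def arima111_forecast(series, phi, theta):
    """One-step-ahead forecast from W_t = phi*W_{t-1} + a_t - theta*a_{t-1}, W_t = y_t - y_{t-1}."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # First differences W_t (d = 1).
    diffs = [b - a for a, b in zip(series, series[1:])]
    prev_w, prev_a = 0.0, 0.0  # assumed start-up values
    for w_t in diffs:
        predicted_w = phi * prev_w - theta * prev_a
        prev_a = w_t - predicted_w  # innovation a_t
        prev_w = w_t
    # Expected next difference (the future innovation has mean zero).
    next_w = phi * prev_w - theta * prev_a
    return series[-1] + next_w


# Hypothetical magnitudes and parameters, used only to show the call.
forecast = arima111_forecast([2.8, 3.0, 2.9, 3.1], phi=0.1, theta=0.9)
```

With `phi = 0` and `theta = 0` the forecast is simply the last observed value, which is a quick sanity check of the recursion.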


Model 3: The combination of ARIMA and GARCH with MLR 


The first researchers to propose combining methods instead of relying on a single method were Bates and Granger [26]. The idea behind combining predictive algorithms is to retain the properties of each model and to capture different features of the dataset. Model 3 combines ARIMA and GARCH through the MLR method. In other words, the outputs of ARIMA and of GARCH with mean offset are the independent variables, and e is the error term, which is assumed to be drawn by independent random sampling from a normal distribution with mean zero and constant variance. According to equation (A5), the least squares technique can be used to determine b1, b2, and b3. The solution is $b = [X'X]^{-1}[X'Y]$. The formula of model 3 is:


$Y = b_1 + b_2 X_1 + b_3 X_2 + e$ (A5)

where X' is the transpose of X.
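
The least squares solution of equation (A5) can be sketched with the normal equations as follows. This is a plain-Python illustration rather than the implementation used in the study: the function name `fit_mlr` is hypothetical, the 3 x 3 system is solved by Gauss-Jordan elimination to avoid external libraries, and the sample GARCH and ARIMA outputs are invented values.

```python
def fit_mlr(x1, x2, y):
    """Solve b = (X'X)^-1 X'Y for Y = b1 + b2*X1 + b3*X2 + e."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix X with intercept column
    # Augmented normal equations [X'X | X'Y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the inputs are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(3):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][3] for i in range(3)]  # [b1, b2, b3]


# Hypothetical model outputs and observed magnitudes.
garch_out = [2.9, 3.0, 2.8, 3.1, 2.95]
arima_out = [2.6, 3.4, 2.9, 3.8, 3.1]
observed = [2.7, 3.3, 2.9, 3.9, 3.0]
b1, b2, b3 = fit_mlr(garch_out, arima_out, observed)
ensemble = [b1 + b2 * g + b3 * a for g, a in zip(garch_out, arima_out)]
```

Here `X1` plays the role of the GARCH output and `X2` the ARIMA output, so the fitted coefficients have the same form as the ensemble equation of model 3.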


Appendix B


The statistical parameters presented by Chang and Hanna are used for assessing the models in the present study [27]. The correlation coefficient (R) is one of these parameters; it measures the strength of the relationship between the predicted values and the corresponding actual values and is calculated by equation (B1). The value of the correlation coefficient lies between -1 and 1. If R equals -1 or 1, there is a perfect linear relationship between the predicted and actual values. In contrast, a correlation of 0 indicates no linear relationship between the predicted and actual values.


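$R = \dfrac{n\left(\sum C_o C_p\right) - \left(\sum C_o\right)\left(\sum C_p\right)}{\sqrt{\left[n \sum C_o^2 - \left(\sum C_o\right)^2\right]\left[n \sum C_p^2 - \left(\sum C_p\right)^2\right]}}$ (B1)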


in which n, $C_o$, and $C_p$ are the number of values in the given dataset, the observed earthquake magnitude, and the predicted earthquake magnitude, respectively.


The root mean square error (RMSE) is another statistical parameter used in the present study for measuring the accuracy of all models. This parameter shows the differences between the predicted and recorded values and can be calculated as follows:


$RMSE = \sqrt{\dfrac{1}{n} \sum \left(C_o - C_p\right)^2}$ (B2)


Another statistical parameter is the normalized mean square error (NMSE), which is used for assessing the validity of all models in this study. This parameter reflects the scatter in the entire dataset and is determined as follows:


NMSE = — (B3) 


oO 


where $\bar{C}_o$ is the mean of the recorded values. The optimal value of the NMSE is zero.


The fractional bias (FB) is a statistical parameter defined as the normalized difference between the mean recorded and mean predicted values. It indicates whether the model is under-predicting or over-predicting, and it is determined as follows:


$FB = \dfrac{\bar{C}_o - \bar{C}_p}{0.5\left(\bar{C}_o + \bar{C}_p\right)}$ (B4)


where $\bar{C}_p$ is the mean of the forecasted values.
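
Three of the evaluation parameters in this appendix can be sketched in a few lines. The correlation coefficient follows equation (B1) and the fractional bias follows equation (B4); the RMSE uses the usual root-mean-square definition, which is an assumption here. The function names and the sample values are hypothetical.

```python
from math import sqrt


def correlation(observed, predicted):
    """Correlation coefficient R in the sum form of equation (B1)."""
    n = len(observed)
    s_o, s_p = sum(observed), sum(predicted)
    s_op = sum(o * p for o, p in zip(observed, predicted))
    s_oo = sum(o * o for o in observed)
    s_pp = sum(p * p for p in predicted)
    return (n * s_op - s_o * s_p) / sqrt((n * s_oo - s_o ** 2) * (n * s_pp - s_p ** 2))


def rmse(observed, predicted):
    """Root mean square error (usual definition, assumed here)."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def fractional_bias(observed, predicted):
    """Fractional bias of equation (B4); positive values mean under-prediction."""
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    return (mean_o - mean_p) / (0.5 * (mean_o + mean_p))


# Hypothetical recorded and predicted magnitudes.
recorded = [3.0, 4.0, 5.0]
forecasted = [2.0, 3.0, 4.0]
scores = (
    correlation(recorded, forecasted),
    rmse(recorded, forecasted),
    fractional_bias(recorded, forecasted),
)
```

In this example the predictions track the recorded values perfectly in shape but sit one unit too low, so R is 1 while the RMSE and the positive FB expose the systematic under-prediction. This is why the parameters are reported together rather than individually.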